Skip to content

feat: add rdf n-tripples graph load source#7

Open
VanyaGlazunov wants to merge 2 commits intomainfrom
feat-add-rdf-graph-load
Open

feat: add rdf n-tripples graph load source#7
VanyaGlazunov wants to merge 2 commits intomainfrom
feat-add-rdf-graph-load

Conversation

@VanyaGlazunov
Copy link
Collaborator

Adds support for loading graphs from RDF N-Triples format as a new edge source.

closes #3 I think.

@VanyaGlazunov VanyaGlazunov force-pushed the feat-add-rdf-graph-load branch 2 times, most recently from e470334 to 9b98673 Compare March 19, 2026 02:54
#[derive(Debug, Clone, Default)]
pub enum LabelExtraction {
/// Use only the local name: the fragment (`#name`) or last path segment.
/// For example, `http://example.org/ns/knows` → `"knows"`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// For example, `http://example.org/ns/knows` → `"knows"`.
/// For example, `http://example.org/ns/knows` → `knows`.

I don't think there should be any quotation marks here so as not to confuse it with literals.

#[default]
LocalName,
/// Use the full IRI string as the label.
/// For example, `http://example.org/ns/knows` → `"http://example.org/ns/knows"`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

}
}

impl<R: Read> Iterator for NTriples<R> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<...> <http://a.org/knows> <...>
<...> <http://b.org/knows> <...>

We lose info about difference between these two predicates if use LocalName. Is it good?

} else if let Some(pos) = iri.rfind('/') {
iri[pos + 1..].to_owned()
} else {
iri.to_owned()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we obtain something like <...> <http://a.org/> <...>? In this case predicate will be empty "". Should we handle this case?

@suvorovrain
Copy link
Collaborator

@VanyaGlazunov Nice. Can you answer on my questions in review?

}

#[cfg(test)]
mod tests {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you, please, introduce a test involving non-ascii chars. Especially I am interested in Chinese and Cyrillic support.

Copy link
Member

@georgiy-belyanin georgiy-belyanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! Consider, please, my comment and Rodion's one.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new RDF N-Triples format parser and wires it into the in-memory graph builder so graphs can be loaded directly from .nt sources (per Issue #3).

Changes:

  • Introduces formats::nt::NTriples<R> iterator (backed by oxttl) to parse N-Triples into Edge items.
  • Adds GraphSource<InMemoryBuilder> support for NTriples<R> and a corresponding in-memory loading test.
  • Updates formats module exports and repo documentation to include the new format.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/graph/mod.rs Minor formatting adjustment in check_graph signature.
src/graph/inmemory.rs Imports NTriples, implements GraphSource for it, and adds an in-memory load test.
src/formats/nt.rs New N-Triples parser/iterator implementation with label-extraction strategy and unit tests.
src/formats/mod.rs Exposes nt module + NTriples re-export; extends FormatError with N-Triples-related variants.
AGENTS.md Documents the new nt.rs format module and usage details.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +214 to +231
[`NTriples<R>`](src/formats/nt.rs:57) parses [W3C N-Triples](https://www.w3.org/TR/n-triples/)
RDF files using `oxttl`. Each triple `(subject, predicate, object)` becomes an
[`Edge`](src/graph/mod.rs:154) where:

- `source` — subject IRI or blank-node ID (`_:label`).
- `target` — object IRI or blank-node ID; triples whose object is an RDF
literal yield `Err(FormatError::LiteralAsNode)` (callers may filter these out).
- `label` — predicate IRI, transformed by [`LabelExtraction`](src/formats/nt.rs:36):

| Variant | Behaviour |
|---|---|
| `LocalName` (default) | Fragment (`#name`) or last path segment of the predicate IRI |
| `FullIri` | Full predicate IRI string |

Constructors:

- [`NTriples::new(reader)`](src/formats/nt.rs:72) — uses `LabelExtraction::LocalName`.
- [`NTriples::with_label_extraction(reader, strategy)`](src/formats/nt.rs:76) — explicit strategy.
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several Markdown source links here use hard-coded line numbers that don’t match the newly added src/formats/nt.rs (e.g., NTriples<R> is at line 64, LabelExtraction at 38, and the constructors at 70/74). Updating these references will prevent broken links in GitHub’s rendered documentation.

Copilot uses AI. Check for mistakes.

/// An RDF literal appeared as a subject or object where a node IRI or
/// blank node was expected.
#[error("RDF literal cannot be used as a graph node (triple skipped)")]
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FormatError::LiteralAsNode's display text says “(triple skipped)”, but nothing in the iterator / GraphBuilder::load path actually skips the triple—item? will propagate the error and abort loading unless the caller filters it out. Either remove the “triple skipped” wording from the error message, or implement default skipping at the GraphSource/builder layer for this specific variant so the message matches behavior.

Suggested change
#[error("RDF literal cannot be used as a graph node (triple skipped)")]
#[error("RDF literal cannot be used as a graph node")]

Copilot uses AI. Check for mistakes.
#[error("N-Triples parse error: {0}")]
NTriples(String),

/// An RDF literal appeared as a subject or object where a node IRI or
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment for LiteralAsNode says “subject or object”, but in RDF/N-Triples a literal cannot appear in the subject position (and the parser type for subject here is NamedOrBlankNode). Consider adjusting the wording to just “object” (or “term position where an IRI/blank node was expected”) to avoid implying an impossible case.

Suggested change
/// An RDF literal appeared as a subject or object where a node IRI or
/// An RDF literal appeared in a term position where an IRI or

Copilot uses AI. Check for mistakes.
Comment on lines +96 to +108
fn extract_label(iri: &str, strategy: &LabelExtraction) -> String {
match strategy {
LabelExtraction::FullIri => iri.to_owned(),
LabelExtraction::LocalName => {
// Fragment takes priority, then last path segment.
if let Some(pos) = iri.rfind('#') {
iri[pos + 1..].to_owned()
} else if let Some(pos) = iri.rfind('/') {
iri[pos + 1..].to_owned()
} else {
iri.to_owned()
}
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_label can return an empty string for IRIs that end with # or / (e.g., http://example.org/ns#). That would create edges with an empty label and store them under the empty-string key in label_buffers, which is likely unintended. Consider falling back to the full IRI (or returning a parse error) when the computed local name is empty.

Copilot uses AI. Check for mistakes.
Comment on lines +194 to +200
impl<R: Read> GraphSource<InMemoryBuilder> for NTriples<R> {
fn apply_to(self, mut builder: InMemoryBuilder) -> Result<InMemoryBuilder, GraphError> {
for item in self {
builder.push_edge(item?)?;
}
Ok(builder)
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GraphSource impl for NTriples<R> duplicates the same for item in self { builder.push_edge(item?)?; } loop used by the existing Csv<R> impl. Since InMemoryBuilder already has with_stream (which handles IntoIterator<Item = Result<Edge, E>>), consider delegating to builder.with_stream(self) to keep the loading logic in one place and avoid divergence between format loaders over time.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RDF N-Triples graph data loading

4 participants